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ABSTRACT 

We consider the problem of fitting a parametric model to time-series data that are afflicted by correlated 
noise. The noise is represented by a sum of two stationary Gaussian processes: one that is uncorrelated in 
time, and another that has a power spectral density varying as l// 7 . We present an accurate and fast [0(N)] 
algorithm for parameter estimation based on computing the likelihood in a wavelet basis. The method is 
illustrated and tested using simulated time-series photometry of exoplanetary transits, with particular attention 
to estimating the midtransit time. We compare our method to two other methods that have been used in the 
literature, the time-averaging method and the residual-permutation method. For noise processes that obey our 
assumptions, the algorithm presented here gives more accurate results for midtransit times and truer estimates 
of their uncertainties. 

Subject headings: methods: statistical — techniques: photometric — stars: planetary systems 



1. INTRODUCTION 

Frequently one wishes to fit a parametric model to time- 
series data and determine accurate values of the parameters 
and reliable estimates for the uncertainties in those parame- 
ters. It is important to gain a thorough understanding of the 
noise and develop appropriate methods for parameter estima- 
tion, especially at the research frontier, where the most inter- 
esting effects are often on the edge of detectability. Underes- 
timating the errors leads to unjustified confidence in new re- 
sults, or confusion over apparent contradictions between dif- 
ferent data sets. Overestimating the errors inhibits potentially 
important discoveries. 

When the errors in the data are well understood and 
uncorrelated, the problem of parameter estimation is rela- 
tively straightforward (see, e.g., Bevington & Robinson 2003, 
Gould 2003, Press et al. 2007). However, when the noise 
is not well-understood — and particularly when the noise ex- 
hibits correlations in time — the problem is more challenging 
(see, e.g., Koen & Lombard 1993, Beran 1994). Traditional 
methods that ignore correlations often give parameter esti- 
mates that are inaccurate and parameter errors that are under- 
estimated. Straightforward generalization of the traditional 
methods is computationally intensive, with time-complexity 
0(N 2 ) in the worst cases (where N is the number of data 
points). This makes certain analyses impractical. 

Our specific concern in this paper is the analysis of time- 
series photometry of exoplanetary transits. During a transit, 
a planet passes in front of the disk of its parent star, which is 
evident from the slight diminution in the light received from 
the star. A model of a transit light curve may have many 
parameters, but we focus mainly on a single parameter, the 
midtransit time t c , for three reasons. The first reason is the 
simplicity of a single-parameter model. The second reason is 
that t c is a unique piece of information regarding each transit 
event, and as such, the accuracy cannot be improved by com- 
bining results from multiple transit observations. Instead one 
must make the most of single-event observations even if they 
are afflicted by correlated noise. The third reason is that tran- 
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sit timing offers a means of discovering additional planets or 
satellites by seeking anomalies in a sequence of transit times 
due to gravitational perturbations [Holman & Murray (2005), 
Agoletal. (2005)]. 1 

Beginning with the work of Pont, Zucker, & Queloz (2006), 
it has been widely recognized that time-correlated noise ("red 
noise") is a limiting factor in the analysis of transit light 
curves. Many practitioners have attempted to account for cor- 
related errors in their parameter estimation algorithms (see, 
e.g., Bakos et al. 2006, Gillon et al. 2006; Winn et al. 2007, 
2009; Southworth 2008). Among these schemes are the 
"time-averaging" method, in which the effects of correlations 
are assessed by computing the scatter in a time-binned version 
of the data (Pont et al. 2006) and the "residual-permutation" 
method, a variant of bootstrap analysis that preserves the time 
ordering of the residuals (Jenkins et al. 2002). 

In this paper we present an alternative method for param- 
eter estimation in the presence of time-correlated noise, and 
compare it to those two previously advocated methods. The 
method advocated here is applicable to situations in which 
the noise is well described as the superposition of two sta- 
tionary (time-invariant) Gaussian noise processes: one which 
is uncorrelated, and the other of which has a power spectral 
density varying as 1 // 7 . 

A more traditional approach to time-correlated noise is the 
framework of autoregressive moving average (ARMA) pro- 
cesses (see, e.g., Box & Jenkins 1976). The ARMA noise 
models can be understood as complementary to our 1 // 7 
model, in that ARMA models are specified in the time do- 
main as opposed to the frequency domain, and they are 
most naturally suited for modeling short-range correlations 
("short-memory" processes) as opposed to long-range cor- 
relations ("long-memory" processes). Parameter estimation 
with ARMA models in an astronomical context has been dis- 
cussed by Koen & Lombard (1993), Konig & Timmer (1997), 
and Timmer et al. (2000). As we will explain, our method 
accelerates the parameter estimation problem by taking ad- 
vantage of the discrete wavelet transform. It is based on the 

1 The transit duration is also expected to vary in the presence of additional 
gravitating bodies; see, e.g., Kipping (2009). 
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fact that a the covariance matrix of a l// 7 noise process is 
nearly diagonal in a wavelet basis. As long as the actual noise 
is reasonably well described by such a power law, our method 
is attractive for its simplicity, computational speed, and ease 
of implementation, in addition to its grounding in the recent 
literature on signal processing. 

The use of the wavelets in signal processing is widespread, 
especially for the restoration, compression, and denoising of 
images (see, e.g., Mallat 1999). Parameter estimation using 
wavelets has been considered but usually for the purpose of 
estimating noise parameters (Wornell 1996). An application 
of wavelets to the problem of linear regression with correlated 
noise was given by Fadili & Bullmore (2002). What is new in 
this work is the extension to an arbitary nonlinear model, and 
the application to transit light curves. 

This paper is organized as follows. In § [2] we review the 
problem of estimating model parameters from data corrupted 
by noise, and we review some relevant noise models. In § [3] 
we present the wavelet method and those aspects of wavelet 
theory that are needed to understand the method. In § |H 
we test the method using simulated transit light curves, and 
compare the results to those obtained using the methods men- 
tioned previously. In § [5] we summarize the method and the 
results of our tests, and suggest some possible applications 
and extensions of this work. 

2. PARAMETER ESTIMATION WITH "COLORFUL" NOISE 

Consider an experiment in which samples of an observable 
yi are recorded at a sequence of times {t, ■ : i = 1 , . . . , A^}. In the 
context of a transit light curve, y; is the relative brightness of 
the host star. We assume that the times f,- are known with neg- 
ligible error. We further assume that in the absence of noise, 
the samples y, would be given by a deterministic function, 

y(ti) = f(thPu---,PK) = f(thP), (no noise) (1) 

where p = {p\, . . . ,px} is a set of K parameters that spec- 
ify the function /. For an idealized transit light curve, those 
parameters may be the fractional loss of light 5, the total du- 
ration T, and ingress or egress duration r, and the midtransit 
time t c , in the notation of Carter et al. (2008). More realis- 
tic functions have been given by Mandel & Agol (2002) and 
Gimenez (2007). 

We further suppose that a stochastic noise process e(f) has 
been added to the data, giving 

y(ti) = f(ti\p) + e(ti). (with noise) (2) 

As a stochastic function, e= {e(fi), . . . e(f^)} is characterized 
by its joint distribution function V(e;q), which in turn de- 
pends on some parameters q and possibly also the times of 
observation. The goal of parameter estimation is to use the 
data y(ti) to calculate credible intervals for the parameters p, 
often reported as best estimates pk and error bars a Pk with 
some quantified degree of confidence. The estimate of p and 
the associated errors depend crucially on how one models the 
noise and how well one can estimate the relevant noise pa- 
rameters q. 

In some cases one expects and observes the noise to be un- 
correlated. For example, the dominant noise source may be 
shot noise, in which case the noise process is an uncorrelated 
Poisson process that in the limit of large numbers of counts is 
well-approximated by an uncorrelated Gaussian process, 

N 1 / e 2 \ 

Vie^m^^^—^i-^y (3) 



in which case there is only one error parameter, a, specifying 
the width of the distribution. 

If the noise is correlated then it is characterized by a joint 
probability distribution that is generally a function of all the 
times of observation. We assume that the function is a multi- 
variate Gaussian function, in which case the noise process is 
entirely characterized by the covariance matrix 

nt i ,tj)={e{t i )e{t j )). (4) 

Here, the quantity (e) is the mean of the stochastic function e 
over an infinite number of independent realizations. We fur- 
ther assume that the covariance depends only on the differ- 
ence in time between two samples, and not on the absolute 
time of either sample. In this case, the noise source is said to 
be stationary and is described entirely by its autocovariance 
R(t) (Bracewell 1965): 

R(T)=(e(t)e(t + T)). (5) 

The parameter estimation problem is often cast in terms of 
finding the set of parameters pk that maximize a likelihood 
function. For the case of Gaussian uncorrelated noise the 
likelihood function is 

N 1 / r 2 \ 

where r, is the residual defined as y, -/(f,-; p), and a is an es- 
timate of the single noise parameter a. Maximizing the like- 
lihood C is equivalent to minimizing the \ 2 statistic 

i 

In transit photometry, the estimator a of the noise parame- 
ter <7 is usually not taken to be the calculated noise based on 
expected sources such as shot noise. This is because the ac- 
tual amplitude of the noise is often greater than the calculated 
value due to noise sources that are unknown or at least ill- 
quantified. Instead, a is often taken to be the standard devia- 
tion of the data obtained when the transit was not occurring, 
or the value for which \ 2 = AW for the best-fitting (minimum- 
X 2 ) model. These estimates work well when the noise process 
is Gaussian, stationary, and uncorrelated. For the case of cor- 
related noise, Eqn. (f7]) is replaced by (Gould 2003) 

N N 

x 2 =EE r ^v> (8) 

The case of uncorrelated noise corresponds to £y = a 2 Sij. 

It is at this point where various methods for modeling cor- 
related noise begin to diverge. One approach is to estimate £ 
from the sample autocovariance R{t) of the time series, just as 
a can be estimated from the standard deviation of the resid- 
uals in the case of uncorrelated noise. However, the calcu- 
lation of x 2 nas a worst-case time-complexity of 0(N 2 ) and 
iterative parameter estimation techniques can be prohibitively 
slow. One might ameliorate the problem by truncating the co- 
variance matrix at some maximum lag, i.e., by considering the 
truncated \ 2 statistic 

N L 

x 2 0D = E E n(t- x ) m) r i+h (9) 

'=1 I=-L 
\<i+l<N 
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but in the presence of long-range correlations one needs 
to ret ain many lags to obtain accurate parameter estimates. 
(In § 14.31 we will give an example where 50-75 lags were 
needed.) Alternatively, one may model the autocorrelation 
function and therefore the covariance matrix using an au- 
toregressive moving-average (ARMA) model with enough 
terms to give a good fit to the data (see, e.g., Koen & Lom- 
bard 1993). Again, though, in the presence of long-range cor- 
relations the model covariance matrix will be non-sparse and 
computationally burdensome. 

Pont et al. (2006) presented a useful simplification in the 
context of a transit search, when data are obtained on many 
different nights. In such cases it is reasonable to approxi- 
mate the covariance matrix as block-diagonal, with different 
blocks corresponding to different nights. Pont et al. (2006) 
also gave a useful approximation for the covariance structure 
within each block, based on the variance in boxcar-averaged 
versions of the signal. Ultimately their procedure results in 
an equation resembling Eqn. (0 for each block, but where a 
is the quadrature sum of a w (the "white noise") and a r (the 
"red noise," estimated from the boxcar-averaged variance). In 
this paper, all our examples involve a single time series with 
stationary noise properties, and the net effect of the Pont et 
al. (2006) method is to enlarge the parameter errors by a fac- 
tor 

relative to the case of purely white noise (o> = 0). We will 
refer to this method as the "time-averaging" method. 

Another approach is to use Eqn. (0 without any modifica- 
tions, but to perform the parameter optimization on a large 
collection of simulated data sets that are intended to have the 
same covariance structure as the actual data set. This is the 
basis of the "residual permutation" method that is also dis- 
cussed further in § 14.41 As mentioned above, this method is 
a variant of a bootstrap analysis that takes into account time- 
correlated noise. More details on both the ti me-a veraging and 
residual-permutation methods are given in § 14.41 

Our approach in this paper was motivated by the desire to 
allow for the possibility of long-range correlations, and yet to 
avoid the slowness of any method based on Eqn. (0 or other 
time-domain methods. Rather than characterizing the noise in 
the time domain, we characterize it by its Power Spectral Den- 
sity (PSD) S( f) at frequency /, defined as the square of the 
Fourier transform of e(f), or equivalently, the Fourier trans- 
form of the autocovariance R(t). We restrict our discussion to 
noise sources with a PSD 

5(/)=^ (11) 

for some A > and spectral index 7. For the special case of 
uncorrected noise, 7 = and S(f) is independent of /. This 
type of noise has equal power density at all frequencies, which 
is why it is called "white noise," in an analogy with visible 
light. As 7 is increased, there is an increasing preponderance 
of low-frequency power over high-frequency power, leading 
to longer-range correlations in time. 

Noise with a power spectrum 1 // 7 is ubiquitous in nature 
and in experimental science, including astrophysics (see, e.g., 
Press 1978). Some examples of l// 7 noise are shown in 
Fig. Q] for a selection of spectral indices. In an extension of 
the color analogy, 7=1 noise is sometimes referred to as 
"pink noise" and 7 = 2 noise as "red noise." The latter is 



also known as a Brownian process, although not because of 
the color brown but instead because of the Scottish botanist 
Robert Brown. However, as we have already noted, the term 
"red noise" is often used to refer to any type of low-frequency 
correlated noise. 

Here we do not attempt to explain how 1 / f 1 noise arises 
in a given situation. Instead we assume that the experimenter 
has done his or her best to understand and to reduce all sources 
of noise as far as possible, but despite these efforts there re- 
mains a component of 1 / f 1 noise. In transit photometry these 
correlations often take the form of "bumps," "wiggles," and 
"ramps" in a light curve and are often attributed to differ- 
ential atmospheric extinction, instrumental artifacts such as 
imperfect flat-fielding, and stellar granulation or other astro- 
physical effects. The method presented in this paper is es- 
sentially a model of the likelihood function that retains the 
essential information in the covariance matrix without being 
prohibitively expensive to compute and store. It is based on a 
wavelet-based description, the subject of the next section. 

3. WAVELETS AND \/P NOISE 

One may regard a time series with N points as a vector in 
an Af-dimensional space that is spanned by N orthonormal unit 
vectors, one for each time index (the "time basis"). The com- 
putational difficulty with correlated noise is that the sample 
covariance matrix £ is not diagonal in the time basis, nor is it 
necessarily close to being diagonal in realistic cases. This mo- 
tivates a search for some alternative basis spanning the data 
space for which the covariance matrix is diagonal or nearly 
diagonal. For example, if the noise took the form of additive 
quasiperiodic signals, it would be logical to work in a Fourier 
basis instead of the time basis. 

The mathematical result that underpins our analysis algo- 
rithm is that in the presence of 1 // 7 noise, the covariance 
matrix is nearly diagonal in a suitable wavelet basis. Before 
giving the details of the algorithm we will briefly review the 
wavelet transform. Our discussion is drawn primarily from 
Wornell (1996), Teolis (1998), Daubechies (1988), and Mal- 
lat (1999). Practical details and an sample implementation of 
the wavelet transform are given by Press et al. (2007). 

A wavelet is a function that is analogous to the sine and 
cosine functions of the Fourier transform. Some properties 
that wavelets share with sines and cosines are that they are 
localized in frequency space, and they come in families that 
are related by translations and dilations. Wavelets are unlike 
sine and cosine functions in that wavelets are strongly local- 
ized in time. A wavelet basis is derived from a single "mother 
wavelet" ip(t), which may have a variety of functional forms 
and analytic properties. The individual basis functions are 
formed through translations and dilations of ip(t). The choice 
of mother wavelet depends on the specific application. We re- 
strict our focus to dyadic orthogonal wavelet bases with basis 
functions 

C(0 = ^(2'"f-«) (12) 

for all integers m and n, and we further require tjj(t) to have 
one or more vanishing moments. 2 In this case, the pair of 
equations analogous to the Fourier series and its inversion is 

00 00 

e ( f ) = E E e «^"« < 13 > 

m=— 00 n=— 00 

2 In particular it is required that the mother wavelet %j)(t) has zero mean. 
This is a necessary and sufficient condition to ensure the invertibility of the 
wavelet transform. 
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FIG. 1. — Examples of l// 7 noise. Uncorrelated (white) noise corresponds 
corresponds to 7 = 2. These time series were generated using the wavelet-based 

/oo 
eW™(t)dt (14) 
00 

where e™ is referred to as the wavelet coefficient of e(t) at 
resolution m and translation n. 

3.1. The wavelet transform as a multire solution analysis 

We will see shortly that some extra terms are required in 
Eqn. ( fl4l i for real signals with some minimum and maximum 
resolution. To explain those terms it is useful to describe the 
wavelet transform as a multiresolution analysis, in which we 
consider successively higher-resolution approximations of a 
signal. An approximation with a resolution of 2'" samples per 
unit time is a member of a resolution space V m . Following 
Wornell (1996) we impose the following conditions: 

1. if f(t) £ V m then for some integer n, f(t-2~ m n) E V m 

2. if/a)GV m then/(20GV m+ i. 

The first condition requires that V m contain all translations (at 
the resolution scale) of any of its members, and the second 
condition ensures that the sequence of resolutions is nested: 
V m is a subset of the next finer resolution V m +\ . In this way, if 
e m (t ) 6 V m is an approximation to the signal e(f ), then the next 
finer approxmation e m +i(t) £ V m +\ contains all the information 
encoded in e m (t ) plus some additional detail d,„(t) defined as 

d m (t) = e m+1 (t)-e m (t). (15) 

We may therefore build an approximation at resolution M by 
starting from some coarser resolution k and adding successive 
detail functions: 

M 

e M (t) = e k (t) + J2d m (t) (16) 

m=k 

The detail functions d m (t) belong to a function space W m (t), 
the orthogonal complement of the resolution V m . 

With these conditions and definitions, the orthogonal ba- 
sis functions of W m are the wavelet functions ijj™(t), ob- 
tained by translating and dilating some mother wavelet ip(t). 



to 7 = 0. "Pink" noise corresponds to 7 = 1. "Red" noise or Brownian motion 
method described in §|4] 

The orthogonal basis functions of V m are denoted <j>'"(t), ob- 
tained by translating and dilating a so-called "father" wavelet 
4>(t). Thus, the mother wavelet spawns the basis of the detail 
spaces, and the father wavelet spawns the basis of the resolu- 
tion spaces. They have complementary characteristics, with 
the mother acting as a high-pass filter and the father acting as 
a low-pass filter. 

In Eqn. (I161 l. the approximation eit(t) is a member of Vi, 
which is spanned by the functions 4> k „{t), and d m (t) is a mem- 
ber of W m , which is spanned by the functions ip"(t). Thus we 
may rewrite Eqn. ( fTBI l as 

00 M 00 

e M (t)= ]T eM(0 + ]T (17) 

«=-oo m=k n=—oo 

The wavelet coefficients e™ and the scaling coefficients e" 1 are 
given by 

/oo 
e(W:(t)dt (18) 
OO 

/OO 
e{i)4%{t)dt (19) 
00 

Eqn. ( fTTI i reduces to the wavelet-only equation (Tl3T > for the 
case of a continuously sampled signal e(t), when we have ac- 
cess to all resolutions m from -00 to 00. 4 

There are many suitable choices for cf> and ip, differing in 
the tradeoff that must be made between smoothness and lo- 
calization. The simplest choice is due to Haar (1910): 

6(t)= i l if0<? ^ 1 (20) 
^ ' 1 otherwise ' ^ ' 

( 1 if-| <t <0 
ip(t)=\-l if < r < i (21) 
I otherwise 

3 More precisely, the wavelet and scaling functions considered here are 
"quadrature mirror filters" (Mallat 1999). 

4 The signal must also be bounded in order for the approximation to the 
signal at infinitely coarse resolution to vanish, i.e., limt—,-^ e^t) = 0. 
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The left panel of Fig.[2]shows several elements of the approx- 
imation and detail bases for a Haar multiresolution analysis. 
The left panels of Fig.[3]illustrate a Haar multiresolution anal- 
ysis for an arbitrarily chosen signal e(f), by plotting both the 
approximations e,„(f) and details d m (t) at several resolutions 
to. The Haar analysis is shown for pedagogic purposes only. 
In practice we found it advantageous to use the more compli- 
cated fourth-order Daubechies wavelet basis, described in the 
next section, for which the elements and the multiresolution 
analysis are illustrated in the right panels of Fig. |2][3] 

3.2. The Discrete Wavelet Transform 

Real signals are limited in resolution, leading to finite M 
and k in Eqn. ( fTTI i. They are also limited in time, allowing 
only a finite number of translations N m at a given resolution 
m. Starting from Eqn. (fTTI i. we truncate the sum over n and 
reindex the resolution sum such that the coarsest resolution is 
k—\, giving 

Ni M N„ 

m=1 m=2 n=l 

where we have taken t = to be the start of the signal. Since 
there is no information on timescales smaller than 2" M , we 
need only consider £«(?,) at a finite set of times ?,■: 

Ni M N,„ 

e(td = J2 ^Jfe) + ££ (23) 

n=l m=2 n=l 

Eqn. ( 1231 is the inverse of the Discrete Wavelet Transform 
(DWT). Unlike the continuous transform of Eqn. dot , the 
DWT must include the coarsest level approximation (the first 
term in the preceding equation) in order to preserve all the 
information in e(f,). For the Haar wavelet, the coarsest ap- 
proximation is the mean value. For data sets with N = no2 M 
uniformly spaced samples in time, we will have access to a 
maximal scale M, as in Eqn. d23l l. with N m = hq2 . 

A crucial point is the availability of the Fast Wavelet Trans- 
form (FWT) to perform the DWT (Mallat 1989). The FWT is 
a pyramidal algorithm operating on data sets of size N = «o2 M 
returning «o(2 M - 1) wavelet coefficients and «o scaling coef- 
ficients for some «o > 0, M > 0. The FWT is a computation- 
ally efficient algorithm that is easily implemented (Press et al. 
2007) and has 0(N) time-complexity (Teolis 1998). 

Daubechies (1988) generalized the Haar wavelet into a 
larger family of wavelets, categorized according to the num- 
ber of vanishing moments of the mother wavelet. The Haar 
wavelet has a single vanishing moment and is the first member 
of the family. In this work we used the most compact mem- 
ber (in time and frequency), %\)=i,T> and <j> =4 A, which is well 
suited to the analysis of 1 // 7 noise for < 7 < 4 (Wornell 
1996). We plot and in the time-domain for several n, 
to in Fig. [2] illustrating the rather unusual functional form of 
4D. The right panel of Fig. [3] demonstrates a multiresolution 
analysis using this basis. Press et al. (2007) provide code to 
implement the wavelet transform in this basis. 

3.3. Wavelet transforms and 1 // 7 noise 

As alluded in §[3] the wavelet transform acts as a nearly di- 
agonalizing operator for the covariance matrix in the presence 
of l// 7 noise. The wavelet coefficients e'" of such a noise 
process are zero-mean, nearly uncorrected random variables. 
Specifically, the covariance between scales to, to' and transla- 
tions n, n' is (Wornell 1996, p. 65) 

(CCX^^jW^. (24) 



The wavelet basis is also convenient for the case in which 
the noise is modeled as the sum of an uncorrected component 
and a correlated component, 

e(f) = e (f) + e 7 (0, (25) 

where eo(f ) is a Gaussian white noise process (7 = 0) with a 
single noise parameter a w , and e 7 (f) has S{ f) = A/f 1 . In the 
context of transit photometry, white noise might arise from 
photon-counting statistics (and in cases where the detector is 
well-calibrated, cr„. is a known constant), while the 7 ^ term 
represents the "rumble" on many time scales due to instru- 
mental, atmospheric, or astrophysical sources. For the noise 
process of Eqn. (|25T > the covariance between wavelet coeffi- 
cients is 

(£e# ) « (a r 2 2- 7 '« + al) 5„ hm , 5 n ,„, . (26) 
and the covariance betwen the scaling coefficients e" 1 is 

0*2-^(7) + o* (27) 

where g(j) is a constant of order unity; for the purposes of 
this work g(l) = (21n2) _1 » 0.72 (Fadili & Bullmore 2002). 
Eqns. (|26| > and (|27| > are the key mathematical results that form 
the foundation of our algorithm. For proofs and further de- 
tails, see Wornell (1996). 

It should be noted that the correlations between the wavelet 
and scaling coefficients are small but not exactly zero. The de- 
cay rate of the correlations with the resolution index depends 
on the choice of wavelet basis and on the spectral index 7. 
By picking a wavelet basis with a higher number of vanishing 
moments, we hasten the decay of correlations. This is why 
we chose the Daubechies 4th-order basis instead of the Haar 
basis. In the numerical experiments decribed in § 4, we found 
the covariances to be negligible for the purposes of parameter 
estimation. In addition, the compactness of the Daubechies 
4th-order basis reduces artifacts arising from the assumption 
of a periodic signal that is implicit in the FWT. 

3.4. The whitening filter 

Given an observation of noise e(f) that is modeled as in 
Eqn. ( f25l l, we may estimate the 7 ^ component by rescal- 
ing the wavelet and scaling coefficients and filtering out the 
white component: 

M N * / a 2 ? -~fm \ 
m=2 n=\ v r 

We may then proceed to subtract the estimate of the corre- 
lated component from the observed noise, eo(f) = e(t)-e 7 (t) 
(Wornell 1996, p. 76). In this way the FWT can be used to 
"whiten" the noise. 

3.5. The wavelet-based likelihood 

Armed with the preceding theory, we rewrite the likelihood 
function of Eqn. (O in the wavelet domain. First we transform 
the residuals r,- = y, — /(?,-; p), giving 

c=y:-f:(p)=ei„+^„ oo) 

r l n=t-fn(P) = A,n + An (3D 

where y'" and f"'(p) are the discrete wavelet coefficients of 
the data and the model. Likewise, y l n and flip) are the no 
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scaling coefficients of the data and the model. Given the di- 
agonal covariance matrix shown in Eqns. ( 126b and d27t . the 
likelihood £ is a product of Gaussian functions at each scale 
m and translation n: 




exp 



for) 

2<4 



2a\ 



where 



(32) 



(33) 
(34) 



are the variances of the wavelet and scaling coefficients re- 
spectively. For a data set with points, calculating the like- 
lihood function of Eqn. (f32l > requires multiplying Gaussian 
functions. The additional step of computing the FWT of the 



residuals prior to computing L adds 0(N) operations. Thus, 
the entire calculation has a time-complexity 0(N). 

For this calculation we must have estimators of the three 
noise parameters 7, o> and a w . These may be estimated sep- 
arately from the model parameters p, or simultaneously with 
the model parameters. For example, in transit photometry, the 
data obtained outside of the transit may be used to estimate 
the noise parameters, which are then used in Eqn. (f32t to esti- 
mate the model parameters. Or, in a single step we could max- 
imize Eqn. ( l32b with respect to all of 7, o>, cr„ and p. Fitting 
for both noise and transit parameters simultaneously is poten- 
tially problematic, because some of the correlated noise may 
be "absorbed" into the choices of the transit parameters, i.e., 
the errors in the noise parameters and transit parameters are 
themselves correlated. This may cause the noise level and the 
parameter uncertainties to be underestimated. Unfortunately, 
there are many instances when one does not have enough out- 
of-transit data for the strict separation of transit and noise pa- 
rameters to be feasible. 

In practice the optimization can be accomplished with an 



7 



iterative routine [such as AMOEBA, Powell's method, or a 
conjugate-gradient method; see Press et al. (2007)]. Confi- 
dence intervals can then be defined by the contours of constant 
likelihood. Alternatively one can use a Monte Carlo Markov 
Chain [MCMC; see, e.g., Gregory (2005)], in which case the 
jump-transition likelihood would be given by Eqn. (l32l . The 
advantages of the MCMC method have led to its adoption by 
many investigators (see, e.g., Holman et al. 2006, Burke et 
al. 2007, Collier Cameron et al. 2007). For that method, com- 
putational speed is often a limiting factor, as a typical MCMC 
analysis involves several million calculations of the likelihood 
function. 

3.6. Some practical considerations 

Some aspects of real data do not fit perfectly into the 
requirements of the DWT. The time sampling of the data 
should be approximately uniform, so that the resolution scales 
of the multiresolution analysis accurately reflect physical 
timescales. This is usually the case for time-series photomet- 
ric data. Gaps in a time series can be fixed by applying the 
DWT to each uninterrupted data segment, or by filling in the 
missing elements of the residual series with zeros. 

The FWT expects the number of data points to be an inte- 
gral multiple of some integral power of two. When this is not 
the case, the time series may be truncated to the nearest such 
boundary; or it may be extended using a periodic boundary 
condition, mirror reflection, or zero-padding. In the numeri- 
cal experiments described below, we found that zero-padding 
has negligible effects on the calculation of likelihood ratios 
and parameter estimation. 

The FWT generally assumes a periodic boundary condition 
for simplicity of computation. A side effect of this simplica- 
tion is that information at the beginning and end of a time se- 
ries are artificially associated in the wavelet transform. This 
is one reason why we chose the 4th-order Daubechies-class 
wavelet basis, which is well localized in time, and does not 
significantly couple the beginning and the end of the time se- 
ries except on the coarsest scales. 

4. NUMERICAL EXPERIMENTS WITH TRANSIT LIGHT CURVES 

We performed many numerical experiments to illustrate and 
test the wavelet method. These experiments involved estimat- 
ing the parameters of simulated transit light curves. We also 
compared the wavelet analysis to a "white" analysis, by which 
we mean a method that assumes the errors to be uncorrected, 
and to two other analysis methods drawn from the literature. 
Because we used simulated transit light curves with known 
noise and transit parameters, the "truth" was known precisely, 
allowing both the absolute and relative merits of the methods 
to be evaluated. 

4.1. Estimating the midtransit time: Known noise parameters 

In this section we consider the case in which the noise pa- 
rameters 7, o>, and a w are known with negligible error. We 
have in mind a situation in which a long series of out-of-transit 
data are available, with stationary noise properties. 

We generated transit light curves with known transit pa- 
rameters p, contaminated by an additive combination of a 
white and a correlated (1 // 7 ) noise source. Then we used an 
MCMC method to estimate the transit parameters and their 
68.3% confidence limits. (The technique for generating noise 
and the MCMC method are described in detail below.) For 
each realization of a simulated light curve, we estimated tran- 



sit parameters using the likelihood defined either by Eqn. © 
for the white analysis, or Eqn. (l32b for the wavelet analysis. 

For a given parameter p k , the estimator p k was taken to be 
the median of the values in the Markov chain and a Pk was 
taken to be the standard deviation of those values. To assess 
the results, we considered the "number-of-sigma" statistic 

Af=(p k -p k )/a Ph . (35) 

In words, Af is the number of standard deviations separating 
the parameter estimate p% from the true value p^. If the error 
in pt is Gaussian, then a perfect analysis method should yield 
results for Af with an expectation value of and variance of 
1 . If we find that the variance of Af is greater than one, then 
we have underestimated the error in p k and we may attribute 
too much significance to the result. On the other hand, if the 
variance of Af is smaller than one, then we have overestimated 
a Pk and we may miss a significant discovery. If we find that 
the mean of Af is nonzero then the method is biased. 

For now, we consider only the single parameter t c , the time 
of midtransit. The t c parameter is convenient for this analy- 
sis as it is nearly decoupled from the other transit parameters 
(Carter et al. 2008). Furthermore, as mentioned in the intro- 
duction, the measurement of the midtransit time cannot be 
improved by observing other transit events, and variations in 
the transit interval are possible signs of additional gravitating 
bodies in a planetary system. 

The noise was synthesized as follows. First, we generated 
a sequence of N = 1024 independent random variables obey- 
ing the variance conditions from Eqns. d26b and (f27b for 1023 
wavelet coefficients over 9 scales and a single scaling coeffi- 
cient at the coarsest resolution scale. We then performed the 
inverse FWT of this sequence to generate our noise signal. In 
this way, we could select exact values for 7, oy, and er w , We 
also needed to find the single parameter a for the white-noise 
analysis; it is not simply related to the parameters 7, o>, and 
a w . In practice, we found a by calculating the median sample 
variance among 10 4 unique realizations of a noise source with 
fixed parameters 7, oy, and a w . 

For the transit model, we used the analytic formulas of 
Mandel & Agol (2002), with a planet-to-star ratio of R p /R+ = 
0.15, a normalized orbital distance of a/R± = 10, and an or- 
bital inclination of i = 90°, as appropriate for a gas giant planet 
in a close-in orbit around a K star. These correspond to a 
fractional loss of light S = 0.0225, duration T = 1.68 hr, and 
partial duration r = 0.152 hr. We did not include the effect 
of limb darkening, as it would increase the computation time 
and has little influence on the determination of t c (Carter et 
al. 2009). Each simulated light curve spanned 3 hr centered 
on the midtransit time, with a time sampling of 1 1 s, giving 
1024 uniformly spaced samples. A noise-free light curve is 
shown in Fig. [4] 

For the noise model, we chose <j K = 1.35 x 10~ 3 and 7 = 1, 
and tried different choices for o>. We denote by a the ratio 
of the rms values of the correlated noise component and the 
white noise component. 5 The example in Fig. [4] has a = 1/3. 
As a is increased from zero, the correlated component be- 
comes more important, as is evident in the simulated data 
plotted in Fig. [5] Our choice of a w corresponds to a preci- 
sion of 5.8 x 10 -4 per minute-equivalent sample, and was in- 
spired by the recent work by Johnson et al. (2009) and Winn 

5 We note that although cr w is the rms of the white noise component, cr r is 
generally not the rms of the correlated component. The notation is unfortu- 
nate, but follows that of Womell (1996). 
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FIG. 4. — Constructing a simulated transit light curve with correlated noise. The total noise is the sum of uncorrected Gaussian noise with standard deviation 
cr w (upper left panel) and correlated noise with a power spectral density S(f) oc 1// and an rms equal to <t„/3 (upper right panel). The total noise (middle left 
panel) is added to an idealized transit model (middle right panel) to produce the simulated data (bottom panel). 



et al. (2009), which achieved precisions of 5.4 x 10 -4 and 
4.0 x 10~ 4 per minute-equivalent sample, respectively. Based 
on our survey of the literature and our experience with the 
Transit Light Curve project (Holman et al. 2006, Winn et 
al. 2007), we submit that all of the examples shown in Fig. [5] 
are "realistic" in the sense that the bumps, wiggles, and ramps 
resemble features in actual light curves, depending on the in- 
strument, observing site, weather conditions, and target star. 

For a given choice of a, we made 10,000 realizations of the 
simulated transit light curve with 1// noise. We then con- 
structed two Monte Carlo Markov Chains for t c starting at 
the true value of t c = 0. One chain was for the white analy- 
sis, with a jump-transition likelihood given by Eqn. ©. The 
other chain was for the wavelet analysis, using Eqn. ( 1321 ) in- 
stead. Both chains used the Metropolis-Hastings jump con- 
dition, and employed perturbation sizes such that «40% of 
jumps were accepted. Initial numerical experiments showed 
that the autocorrelation of a given Markov chain for t c is 
sharply peaked at zero lag, with the autocorrelation dropping 
below 0.2 at lag-one. This ensured good convergence with 



chain lengths of 500 (Tegmark et al. 2004). Chain histograms 
were also inspected visually to verify that the distribution was 
smooth. We recorded the median t c and standard deviation a,.. 
for each chain and constructed the statistic Af for each sepa- 
rate analysis (white or wavelet). Finally, we found the median 
and standard deviation of Af over all 10,000 noise realizations. 

Fig. [6] shows the resulting distributions of Af, for the par- 
ticular case a = 1/3. Table [JJ gives a collection of results for 
the choices a = 0, 1/3, 2/3, and 1. The mean of Af is zero 
for both the white and wavelet analyses: neither method is 
biased. This is expected, because all noise sources were de- 
scribed by zero-mean Gaussian distributions. However, the 
widths of the distributions of Af show that the white analy- 
sis underestimates the error in t c . For a transit light curve 
constructed with equal parts white and 1// noise (a = 1), the 
white analysis gave an estimate of t c that differs from the true 
value by more than 1 a nearly 80% of the time. The factor by 
which the white analysis underestimates the error in t c appears 
to increase linearly with a. In contrast, for all values of a, the 
wavelet analysis maintains a unit variance in Af, as desired. 
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FIG. 5. — Examples of simulated transit light curves with different ratios a = rms r /rms„ between the rms values of the correlated noise component and white 
noise component. 



TABLE 1 

Estimates of mid-transit time, t c , from data with known noise 

PROPERTIES 



Method a (& tc ) [sec] (AO ctjv prob(N > 1) prc*(best) a 



White 4.1 +0.004 0.95 29% 50% 

1/3 4.3 -0.005 1.93 61% 39% 

2/3 5.0 +0.005 3.04 75% 35% 

1 5.9 -0.036 3.82 79% 34% 



Wavelet 4.0 +0.005 0.95 29% 50% 

1/3 7.2 -0.004 0.93 28% 61% 

2/3 11.5 -0.004 0.94 28% 65% 

1 16.0 -0.001 0.95 29% 66% 



a The probability that the analysis method (white or wavelet) returns an esti- 
mate of t c that is closer to the true value than the other method. 



The success of the wavelet method is partially attributed to 
the larger (and more appropriate) error intervals that it returns 
for i c . It is also partly attributable to an improvement in the 
accuracy of i c itself: the wavelet method tends to produce i c 
values that are closer to the true t c . This is shown in the final 
column in Table (Q~|i, where we report the percentage of cases 
in which the analysis method (white or wavelet) produces an 
estimate of t c that is closer to the truth. For a = 1 the wavelet 
analysis gives more accurate results 66% of the time. 



4.2. Estimating the midtransit time: Unknown noise 
parameters 

In this section we consider the case in which the noise pa- 
rameters are not known in advance. Instead the noise param- 
eters must be estimated based on the data. We did this by in- 
cluding the noise parameters as adjustable parameters in the 
Markov chains. In principle this could be done for all three 
noise parameters 7, cr r , and a w , but for most of the experi- 
ments presented here we restricted the problem to the case 
7=1. This may be a reasonable simplification, given the pre- 
ponderance of natural noise sources with 7=1 (Press 1978). 
Some experiments involving noise with 7 ^ 1 are described at 
the end of this section. 

We also synthesized the noise with a non-wavelet tech- 
nique, to avoid "stacking the deck" in favor of the wavelet 
method. We generated the noise in the frequency domain, as 
follows. We specified the amplitudes of the Fourier coeffi- 
cients using the assumed functional form of the power spec- 
tral density [S(f) oc 1 //], and drew the phases from a uniform 
distribution between -it and tt. The correlated noise in the 
time domain was found by performing an inverse Fast Fourier 
Transform. We rescaled the noise such that the rms was a 
times the specified a w . The normally-distributed white noise 
was then added to the correlated noise to create the total noise. 
This in turn was added to the idealized transit model. 

For each choice of a, we made 10,000 simulated transit 
light curves and analyzed them with the MCMC method de- 
scribed previously. For the white analysis, the mid-transit 
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TABLE 2 

Effect of time sampling on the white analysis 





Cadence [sec] 




256 


42.2 


1.72 


512 


21.1 


2.04 


1024 


10.5 


2.69 


2048 


5.27 


3.49 


4096 


2.63 


4.39 



FIG. 6. — Histograms of the number-of-sigma statistic J\f for the midtransit time t c . Each distribution shows the probability of estimating a value for t c that 
differs by Ma from the true value. The simulated data were created by adding an idealized transit model to a noise source that is the sum of uncorrelated noise 
and 1// noise with equal variances (a = 1; see the text). 

time t c and the single noise parameter a were estimated using 
the likelihood defined via Eqn. ©. For the wavelet analysis 
we estimated t c and the two noise parameters o> and <j w using 
the likelihood defined in Eqn. ( f32b . 

Table[3]gives the resulting statistics from this experiment, in 
the same form as were given in Table Q]for the case of known 
noise parameters. (This table also includes some results from 
§ 14.41 which examines two other methods for coping with cor- 
related noise.) Again we find that the wavelet method pro- 
duces a distribution of Af with unit variance, regardless of 
a; and again, we find that the white analysis underestimates 
the error in t c . In this case the degree of error underestima- 
tion is less severe, a consequence of the additional freedom 
in the noise model to estimate a from the data. The wavelet 
method also gives more accurate estimates of t c than the white 
method, although the contrast between the two methods is 
smaller than it was with for the case of known noise parame- 
ters. 

Our numerical results must be understood to be illustrative, 
and not universal. They are specific to our choices for the 
noise parameters and transit parameters. Via further numer- 
ical experiments, we found that the width of Af in the white 
analysis is independent of a w , but it does depend on the time 
sampling. In particular, the width grows larger as the time 
sampling becomes finer (see Tabled. This can be understood 
as a consequence of the long-range correlations. The white 
analysis assumes that the increased number of data points will 
lead to enhanced precision, whereas in reality, the correlations 
negate the benefit of finer time sampling. 

Table|4]gives the results of additional experiments with 7 / 
1. In those cases we created simulated noise with 7 y 1 but in 
the course of the analysis we assumed 7=1. The correlated 
noise fraction was set to a = 1/2 for these tests. The results 
show that even when 7 is falsely assumed to be unity, the 
wavelet analysis still produces better estimates of t c and more 
reliable error bars than the white analysis. 



a The number of samples in a 3 hr interval. 

4.3. Runtime analysis of the time-domain method 

Having established the superiority of the wavelet method 
over the white method, we wish to show that the wavelet 
method is also preferable to the more straightforward ap- 
proach of computing the likelihood function in the time do- 
main with a non-diagonal covariance matrix. The likelihood 
in this case is given by Eqn. ©. 

The time-domain calculation and the use of the covariance 
matrix raised two questions. First, how well can we estimate 
the autocovariance R(t) from a single time series? Second, 
how much content of the resulting covariance matrix needs to 
be retained in the likelihood calculation for reliable parame- 
ter estimation? The answer to the first question depends on 
whether we wish to utilize the sample autocorrelation as the 
estimator of R(t) or instead use a parametric model (such as 
an ARMA model) for the autocorrelation. In either case, our 
ability to estimate the autocorrelation improves with number 
of data samples contributing to its calculation. The second 
question is important because retaining the full covariance 
matrix would cause the computation time to scale as 0(N 2 ) 
and in many cases the analysis would be prohibitively slow. 
The second question may be reframed as: what is the mini- 
mum number of lags L that needs to be considered in comput- 
ing the truncated x 2 of Eqn. (O, in order to give unit variance 
in the number-of-sigma statistic for each model parameter? 
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TABLE 3 

Estimates of t c from data with unknown noise properties 



Method 


a 


(a, c ) [sec] 


(AO 




probiM > 1) 


/>rofo(better) a 


White 





4.0 


-0.011 


0.97 


31% 






1/3 


4.2 


+0.010 


1.70 


57% 






2/3 


4.9 


+0.012 


2.69 


73% 






1 


5.8 


+0.023 


3.28 


78% 





Wavelet 4.5 -0.009 0.90 26% 50% 

1/3 6.9 -0.003 1.03 33% 56% 

2/3 11.2 -0.005 1.07 35% 57% 

1 15.7 -0.007 1.09 36% 57% 



Time-averaging 4.4 -0.006 0.88 26% 50% 

1/3 6.8 +0.009 1.15 36% 50% 

2/3 11.6 -0.012 1.24 40% 50% 

1 17.6 +0.007 1.21 38% 50% 



Residual-permutation 3.5 -0.012 1.16 37% 50% 

1/3 6.6 +0.013 1.24 37% 50% 

2/3 11.8 -0.014 1.28 38% 49% 

1 17.3 +0.008 1.30 38% 48% 



a The probability that the analysis method returns an estimate of t c that is closer to the true value than the white analysis. 



TABLE 4 

Estimates of t c from data with unknown noise properties 



Method 7 a (a, c ) [sec] (AO cr^f probiM > 1) profc(best) b 



White 0.5 4.5 -0.025 1.34 47% 50% 
1.5 4.6 +0.020 3.10 77% 32% 



Wavelet 0.5 6.7 -0.021 0.97 30% 50% 

1.5 6.9 +0.002 1.17 39% 68% 



lihood. We estimated t c and a tc , and calculated JV. We did 
this for 5 , 000 realizations and determined a^, the variance in 
TV", across this sample. We repeated this process for different 
choices of the maximum lag L. Fig. © shows the dependence 
of a^f upon the maximum lag L. 

The time-domain method works fine, in the sense that when 
enough non-diagonal elements in the covariance matrix are 
retained, the parameter estimation is successful. We find that 
ajsf approaches unity as L' 13 with j3 = 0.15, 0.25 for a = 1/3, 
2/3, respectively. However, to match the reliability of the 
wavelet method, a large number of lags must be retained. To 
reach aj^ = 1.05, we need L m 50 for a = 1/3 or L w 75 
for a = 2/3. In our implementation, the calculation based on 
the truncated covariance matrix [Eqn. (0] took 30^+0 times 
longer than the calculation based on the wavelet likelihood 
[Eqn. ([3l]. 

This order-of-magnitude penalty in runtime is bad enough, 
but the real situation may be even worse, because one usually 
has access to a single noisy estimate of the autocovariance 
matrix. Or, if one is using an ARMA model, the estimated pa- 
rameters of the model might be subject to considerable uncer- 
tainty as compared to the "exact" autocovariance employed in 
our numerical experiments. If it is desired to determine the 
noise parameters simultaneously with the other model param- 
eters, then there is a further penalty associated with inverting 
the covariance matrix at each step of the calculation for use 
in Eqn. (0, although it may be possible to circumvent that 
particular problem by modeling the inverse-covariance matrix 
directly. 

4.4. Comparison with other methods 

In this section we compare the results of the wavelet method 
to two methods for coping with correlated noise that are 
drawn from the recent literature on transit photometry. The 
first of these two methods is the "time averaging" method 
that was propounded by Pont et al. (2006) and used in vari- 
ous forms by Bakos et al. (2006), Gillon et al. (2006), Winn 
et al. (2007, 2008, 2009), Gibson et al. (2008), and others. In 



a The spectral exponent of the Power Spectral Density, Sif) <x l// 7 . b The 
probability that the analysis method (white or wavelet) returns an estimate of 
t c that is closer to the true value than the other method. 

The time-complexity of the truncated likelihood calculation 
is 0(NL). If L < 5 then the time-domain method and the 
wavelet method may have comparable computational time- 
complexity, while for larger L the wavelet method would offer 
significant advantage. 

We addressed these questions by repeating the experiments 
of the previous sections using a likelihood function based on 
the truncated \ 2 statistic. We assum ed th at the parameters of 
the noise model were known, as in § 14.11 The noise was syn- 
thesized in the wavelet domain, with 7=1, a w = 0.00135, and 
a set equal to 1/3 or 2/3. The parameters of the transit model 
and the time series were the same as in § 14.11 We calculated 
the "exact" autocovariance function R(l) at integer lag I for 
a given a by averaging sample autocovariances over 50,000 
noise realizations. Fig.[7]plots the autocorrelation [R(l)/R(0)] 
as a function of lag for a = 1/3, 2/3. We constructed the 
stationary covariance Ey =R(\i—j\) and computed its inverse 
(S _1 )y for use in Eqn. ©. 

Then we used the MCMC method to find estimates and er- 
rors for the time of midtransit, and calculated the number-of- 
sigma statistic J\f as defined in Eqn. (f33T >. In particular, for 
each simulated transit light curve, we created a Markov chain 
of 1,000 links for t c , using y 2 (L) in the jump-transition like- 



12 




FIG. 7. — Autocorrelation functions of correlated noise. The noise was 
computed as the sum of white noise with cr„ = 0.00135 and 1 // noise with 
an rms equal to aa w , for a = l/3or2/3. 
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FIG. 8. — Accuracy of the truncated time-domain likelihood in estimating 
midtransit times. Plotted is the variance in the number-of-sigma statistic crj\f 
for the midtransit time t c , as a function of the maximum lag in the truncated 
series. The estimates of t c were found using the truncated likelihood given in 
Eqn. ®. 

one implementation, the basic idea is to calculate the sample 
variance of unbinned residuals, o\ , and also the sample vari- 
ance of the time-averaged residuals, a 2 n , where every n points 
have been averaged (creating m time bins). In the absence of 
correlated noise, we expect 



eC = - 



m 



m— 1 



(36) 



In the presence of correlated noise, <j\ differs from this ex- 
pectation by a factor p^. The estimator (3 is then found by 

averaging j3 n over a range An corresponding to time scales 
that are judged to be most important. In the case of transit 
photometry, the duration of ingress or egress is the most rel- 
evant time scale (corresponding to averaging time scales on 
the order of tens of minutes, in our example light curve). A 
white analysis is then performed, using the noise parameter 



a = (3a\ instead of <j\ . This causes the parameter errors a Pk to 
increase by (3 but does not change the parameter estimates pt 
themselves. 6 

A second method is the "residual permutation" method that 
has been used by Jenkins et al. (2002), Moutou et al. (2004), 
Southworth (2008), Bean et al. (2008), Winn et al. (2008), 
and others. This method is a variant of a bootstrap analysis, 
in which the posterior probability distribution for the parame- 
ters is based on the collection of results of minimizing \ 2 (as- 
suming white noise) for a large number of synthetic data sets. 
In the traditional bootstrap analysis the synthetic data sets are 
produced by scrambling the residuals and adding them to a 
model light curve, or by drawing data points at random (with 
replacement) to make a simulated data set with the same num- 
ber of points as the actual data set. In the residual permuta- 
tion method, the synthetic data sets are built by performing 
a cyclic permutation of the time indices of the residuals, and 
then adding them to the model light curve. In this way, the 
synthetic data sets have the same bumps, wiggles, and ramps 
as the actual data, but they are translated in time. The pa- 
rameter errors are given by the widths of the distributions in 
the parameters that are estimated from all the different real- 
izations of the synthetic data, and they are usually larger than 
the parameter errors returned by a purely white analysis. 

As before, we limited the scope of the comparison to the 
estimation of t c and its uncertainty. We created 5,000 realiza- 
tions of a noise source with 7=1 and a given value of a (either 
0, 1/3, 2/3, or 1). We used each of the two approximate meth- 
ods (time-averaging and residual-permutation) to calculate (3 
and its uncertainty based on each of the 5,000 noise realiza- 
tions. Then we found the median and standard deviation of 
(3/ (3 over all 5,000 realizations. Table (fj) presents the results 
of this experiment. 

Both methods, time-averaging and residual-permutation, 
gave more reliable uncertainties than the white method. How- 
ever they both underestimated the true uncertainties by ap- 
proximately 15-30%. Furthermore, neither method provided 
more accurate estimates of t c than did the white method. For 
the time-averaging method as we have implemented it, this 
result is not surprising, for that method differs from the white 
method only in the inflation of the error bars by some factor 
(3. The parameter values that maximize the likelihood func- 
tion were unchanged. 

4.5. Alternative noise models 

We have shown the wavelet method to work well in the 
presence of 1 / f 1 noise. Although this family of noise pro- 
cesses encompasses a wide range of possibilities, the universe 
of possible correlated noise processes is much larger. In this 
section we test the wavelet method using simulated data that 
has correlated noise of a completely different character. In 
particular, we focus on a process with exclusively short-term 
correlations, described by one of the aforementioned autore- 
gressive moving-average (ARMA) class of parametric noise 
models. In this way we test our method on a noise process 
that is complementary to the longer-range correlations present 
in 1 /f 1 noise, and we also make contact between our method 
and the large body of statistical literature on ARMA models. 

6 Alternatively one may assign an error to each data point equal to the 
quadrature sum of the measurement error and an extra term ay (Pont et 
al. 2006). For cases in which the errors in the data points are all equal or 
nearly equal, these methods are equivalent. When the errors are not all the 
same, it is more appropriate to use the quadrature-sum approach of Pont et 
al. (2006). In this paper all our examples involve homogeneous errors. 
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For l// 7 noise we have shown that time-domain param- 
eter estimation techniques are slow. However, if the noise 
has exclusively short-range correlations, the autocorrelation 
function will decay with lag more rapidly than a power law, 
and the truncated-;^ 2 likelihood [Eqn. ©] may become com- 
putationally efficient. ARMA models provide a convenient 
analytic framework for parameterizing such processes. For a 
detailed review of ARMA models and their use in statistical 
inference, see Box & Jenkins (1976). Applications of ARMA 
models to astrophysical problems have been described by in 
Koen & Lombard (1993), Konig & Timmer (1997) and Tim- 
mer et al. (2000). 

To see how the wavelet method performs on data with short- 
range correlations we constructed synthetic transit data in 
which the noise is described by a single-parameter autoregres- 
sive [AR(1; ip)] model. An AR(1; ip) process e(f,) is defined 
by the recursive relation 



<t i ) = r](t i ) + if>e(ti- 1 ) 



(37) 



where r/(ti) is an uncorrected Gaussian process with width 
parameter a and ip is the sole autoregressive parameter. The 
autocorrelation 7(f) for an AR(1; ip) process is 

„2 



7 (/) = 



(38) 



An AR(1;?/') process is stationary so long as < ip < 1 (Box & 
Jenkins 1976). The decay length of the autocorrelation func- 
tion grows as ip is increased from zero to one. Figure (O plots 
the autocorrelation function of a process that is an additive 
combination of an AR(1; ip = 0.95) process and a white noise 
process. The noise in our synthetic transit light curves was the 
sum of this AR(1; ip = 0.95) process, and white noise, with 
a = 1/2 (see Fig |9). With these choices, the white method 
underestimates the error in t c , while at the same time the syn- 
thetic data look realistic. 

We proceeded with the MCMC method as described pre- 
viously to estimate the time of mid-transit. All four methods 
assessed in the previous section were included in this analysis, 
for comparison. Table[5]gives the results. The wavelet method 
produces more reliable error estimates than the white method. 
However, the wavelet method no longer stands out as supe- 
rior to the time-averaging method or the residual-permutation 
method; all three of these methods give similar results. This 
illustrates the broader point that using any of these methods 
is much better than ignoring the noise correlations. The re- 
sults also show that although the wavelet method is specif- 
ically tuned to deal with l// 7 noise, it is still useful in the 
presence of noise with shorter-range correlations. 

It is beyond the scope of this paper to test the applicabil- 
ity of the wavelet method on more general ARMA processes. 
Instead we suggest the following approach when confronted 
with real data [see also Beran (1994)]. Calculate the sample 
autocorrelation, and power spectral density, based on the out- 
of- transit data or the residuals to an optimized transit model. 
For stationary processes these two indicators are related as 
described in § [2] Short-memory, ARMA-like processes can 
be identified by large autocorrelations at small lags or by fi- 
nite power spectral density at zero frequency. Long-memory 
processes (1 // 7 ) can be identified by possibly small but non- 
vanishing autocorrelation at longer lags. Processes with short- 
range correlations could be analyzed with an ARMA model 
of the covariance matrix [see Box & Jenkins (1976)], or the 
truncated-lag covariance matrix, although a wavelet-based 
analysis may be sufficient as well. Long-memory processes 



are best analyzed with the wavelet method as described in this 
paper. 

It should also be noted that extensions of ARMA mod- 
els have been developed to mimic long-memory, l// 7 pro- 
cesses. In particular, fractional autoregressive integrated 
moving-average models (ARFIMA) describe "nearly" l// 7 
stationary processes, according to the criterion described by 
Beran (1994). As is the case with ARMA models, ARFIMA 
models enjoy analytic forms for the likelihood in the time- 
domain. Alas, as noted by Wornell (1996) and Beran (1994), 
the straightforward calculation of this likelihood is compu- 
tationally expensive and potentially unstable. For l//" 7 pro- 
cesses, the wavelet method is probably a better choice than 
any time-domain method for calculating the likelihood. 



AR(1;V> = 0.95) + whHe, a ^ 0.5 




FIG. 9. — An example of an autoregressive noise process with complemen- 
tary characteristics to a I// 7 process. The top panel shows the sum of an 
AR(1) process with if) = 0.95 and white noise. The correlated and uncorre- 
lated components have equal variances (a = 0.5). 



4.6. Transit timing variations estimated from a collection of 
light curves 

We present here an illustrative calculation that is relevant 
to the goal of detecting planets or satellites through the per- 
turbations they produce on the sequence of midtransit times 
of a known transiting planet. Typically an observer would fit 
the midtransit times f e ,, to a model in which the transits are 
strictly periodic: 



t cJ = t c . + EiP 



(39) 



for some integers Ej and constants t Ci o and P. Then, the residu- 
als would be computed by subtracting the best-fit model from 
the data, and a test for anomalies would be performed by as- 
sessing the likelihood of obtaining those residuals if the lin- 
ear model were correct. Assuming there are N data points 
with normally-distributed, independent errors, the likelihood 
is given by a x 2 -distribution, prob(x 2 , AWX where 



X 



t c ,i-{t c .Q + EiP) 



(40) 



and A^jot = N — 2 is the number of degrees of freedom. Values 
of x 2 with a low probability of occurrence indicate the linear 
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TABLE 5 

Estimates of t c from data with autoregressive correlated noise 



Method (a tc ) [sec] (Af) cr^ prob(M > 1) profo(better) a 

White 4.5 -0.010 2.50 70% 

Wavelet 8.7 -0.016 1.33 44% 51% 

Time-averaging 9.9 -0.010 1.25 40% 49% 

Residual-permutation 10.2 -0.010 1.23 38% 51% 



a The probability that the analysis method returns an estimate of t c that is closer to the true value than the white analysis. 



TABLE 6 

Linear fits to estimated midtransit times 



Method 


Fitted Period / True Period 


x7aw 


prob(x 2 < X 2 ) 


White 


1.00000071 ±0.00000043 


2.25 


98% 


Wavelet 


1 .00000048 ± 0.00000077 


0.93 


51% 



model is deficient, that there are significant anomalies in the 
timing data, and that further observations are warranted. 

We produced 10 simulated light curves of transits of the 
particular planet GJ 436b, a Neptune-sized planet transiting 
an M dwarf (Butler et al. 2004, Gillon et al. 2007) which has 
been the subject of several transit-timing studies (see, e.g., 
Ribas et al. 2008, Alonso et al. 2008, Coughlin et al. 2008). 
Our chosen parameters were R p /R+ = 0.084, a/R* = 12.25, 
i = 85.94 deg, and P = 2.644 d. This gives S = 0.007, T = 1 hr, 
and r = 0.24 hr. We chose limb-darkening parameters as ap- 
propriate for the SDSS r band (Claret 2004). We assumed that 
10 consecutive transits were observed, in each case giving 512 
uniformly-sampled flux measurements over 2.5 hours cen- 
tered on the tra nsit t ime. Noise was synthesized in the Fourier 
domain (as in § 14.21 . with a white component a K = 0.001 and a 
1// component with rms 0.0005 (a = 1 /2). The 10 simulated 
light curves are plotted in Fig. ( flOb . Visually, they resemble 
the best light curves that have been obtained for this system. 

To estimate the midtransit time of each simulated light 
curve, we performed a wavelet analysis and a white analysis, 
allowing only the midtransit time and the noise parameters to 
vary while fixing the other parameter values at their true val- 
ues. We used the same MCMC technique that was described 
in § 14.21 Each analysis method produced a collection of 10 
midtransit times and error bars. These 10 data points were 
then fitted to the linear model of Eqn. d39b . Fig. ( fTTT ) shows 
the residuals of the linear fit (observed - calculated). Table [6] 
gives the best-fit period for each analysis (wavelet or white), 
along with the associated values of \ 2 - 

As was expected from the results of § 14.21 the white analy- 
sis gave error bars that are too small, particularly for epochs 4 
and 7. As a result, the practitioner of the white analysis would 
have rejected the hypothesis of a constant orbital period with 
98% confidence. In addition, the white analysis gave an es- 
timate for the orbital period that is more than ler away from 
the true value, which might have complicated the planning 
and execution of future observations. The wavelet method, in 
contrast, neither underestimated nor overestimated the errors, 
giving x 2 ~ Nd f in excellent agreement with the hypothesis 
of a constant orbital period. The wavelet method also gave an 
estimate for the orbital period within lcr of the true value. 



4.7. Estimation of multiple parameters 

Thus far we have focused exclusively on the determination 
of the midtransit time, in the interest of simplicity. However, 
there is no obstacle to using the wavelet method to estimate 
multiple parameters, even when there are strong degeneracies 
among them. In this section we test and illustrate the ability of 
the wavelet method to solve for all the parameters of a transit 
light curve, along with the noise parameters. 

We modeled the transit as in §§ 14.11 and 14.21 The noise 
was synthesized in the frequency domain (as in § 14.21 ). using 
a w = 0.0045, 7=1, and a = 1/2. The resulting simulated light 
curve is the upper time series in Fig.[T2] We used the MCMC 
method to estimate the transit parameters {R p / 'R+,a/ R+,i,t c } 
and the noise parameters {a r ,a w } (again fixing 7=1 for sim- 
plicity). The likelihood was evaluated with either the wavelet 
method [Eqn. (f32b l or the white method [Eqn. {§)]. 
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FIG. 10. — Simulated transit observations of the "Hot Neptune" GJ 436. 
Arbitrary vertical offsets have been applied to the light curves, to separate 
them on the page. 
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White Method 



Epoch 



FIG. 11. — Transit timing variations estimated from simulated transit obser- 
vations of GJ 436b. Each panel shows the residuals (observed — calculated) 
of a linear fit to the estimated midtransit times. The midtransit times were 
estimated with a wavelet analysis and also with a white analysis, as described 
in the text. The dashed lines indicate the 1 <r errors in the linear model. 

Fig-EJdisplays the results of this analysis in the form of the 
posterior distribution for the case of t c , and the joint posterior 
confidence regions for the other cases. The wavelet method 
gives larger (and more appropriate) confidence regions than 
the white analysis. In accordance with our previous findings, 
the white analysis underestimates the error in t c and gives an 
estimate of t c that differs from the true value by more than 
lcr. The wavelet method gives better agreement. Both anal- 
yses give an estimate for R p /R* that is smaller than the true 
value of 0.15, but in the case of the white analysis, this shift 
is deemed significant, thereby ruling out the correct answer 
with more than 95% confidence. In the wavelet analysis, the 
true value of R p /R+ is well within the 68% confidence re- 
gion. Both the wavelet and white analyses give accurate val- 
ues of a /R* and the inclination, and the wavelet method re- 
ports larger errors. As shown in Fig. ( fT3b . the wavelet method 
was successful at identifying the parameters (a and <r w ) of the 
underlying 1 // noise process. 

Fig. [T2l shows the best-fitting transit model, and also illus- 
trat es th e action of the "whitening" filter that was described 
in § 13.41 The jagged line plotted over the upper time series is 
the best estimate of the 1// contribution to the noise, found 
by applying the whitening filter [Eqn. d29b l to the data using 
the estimated noise parameters. The lower time series is the 
whitened data, in which the 1 // component has been sub- 
tracted. Finally, in Fig. [14] we compare the estimated 1// 
noise component with the actual 1 // component used to gen- 
erate the data. Possibly, by isolating the correlated component 
in this way, and investigating its relation to other observable 
parameters, the physical origin of the noise could be identified 
and understood. 

5. SUMMARY AND DISCUSSION 

In this paper we have introduced a technique for parameter 
estimation based on fitting a parametric model to a time se- 
ries that may be contaminated by temporally correlated noise 
with a 1 IP power spectral density. The essence of the tech- 
nique is to calculate the likelihood function in a wavelet ba- 
sis. This is advantageous because a broad class of realistic 
noise processes produce a nearly diagonal covariance matrix 
in the wavelet basis, and because fast methods for computing 
wavelet transforms are available. We have tested and illus- 
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FIG. 12. — Wavelet analysis of a single simulated transit light curve. 
Top. — Simulated light curve with correlated noise. The jagged line is the 
best-fitting transit model plus the best-fitting model of the 1// component of 
the nois e. B ottom. — Simulated light curve after applying the whitening filter 
of Eqn. )29t , using the noise parameters estimated from the wavelet analysis. 
The solid line is the best-fitting transit model. 

trated this technique, and compared it to other techniques, us- 
ing numerical experiments involving simulated photometric 
observations of exoplanetary transits. 

For convenience we summarize the likelihood calculation 
here: 

• Given the N data points y(f,) obtained at evenly-spaced 
times tt, subtract the model /)(?,; p) with model param- 
eters pto form the N residuals r(f,) = y(?,) —f(ff, p). 

• If N is not a multiple of a power of two, either truncate 
the time series or enlarge it by padding it with zeros, 
until N = n 2 M for some n > 0, M > 0. 

• Apply the Fast Wavelet Transform (FWT) to the resid- 
uals to obtain «o(2 M - 1) wavelet coefficients rjj 1 and «o 
scaling coefficients f\. 

• For stationary, Gaussian noise built from an additive 
combination of uncorrelated and correlated noise (with 
Power Spectral Density S(f) oc l// 7 ), the likelihood 
for the residuals r(f,) is given by 




exp 



far 



21 



2a, 



where 



2aj 



a 2 w = a?2-~"" + *l 
a 2 s = a^g( 7 ) + al 



(41) 



(42) 
(43) 



for some noise parameters cr w > 0, a r > and g(j) = 
0(1) [e.g., 0.72]. 

The calculation entails the multiplication of terms and has 
an overall time-complexity of 0(N). With this prescription 



16 




1.64 
1.62 
1.60 
f 1.58 
•" 1.56 
1.54 

1.52 
1.50 





i i !■ i ■ ■ I i ■ i i 1 ■ i ■ ■ I i ■ i i 




















. . i .... i .... 1 .... i .... 



0.0048 



0.0046 



0.0044 



8.0 8.5 9.0 9.5 10.0 10.5 1 1 .0 
o/R. 




FIG. 13. — Results of parameter estimation for the simulated light curve of Fig. [12] Results for both the wavelet method (solid lines) and the white method 
(dashed lines) are compared. The upper left panel shows the posterior distribution for the midtransit time. The other panels show confidence contours (68.3% 
and 95.4%) of the joint posterior distribution of two parameters. The true parameter values are indicated by dotted lines. 
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FIG. 14. — Isolating the correlated component. Plotted are the actual and 
esti mated 1/f components of the noise in the simulated light curve plotted in 
Fig. i\2\. The estimated 1/f signal was found by applying the wavelet filter, 
Eqn. )29t . to the residuals. 

for the likelihood function, the parameters may be optimized 
using any number of traditional algorithms. For example, the 
likelihood may be used in the jump-transition probability in a 
Monte Carlo Markov Chain analysis, as we have done in this 
work. 

Among the premises of this technique are that the corre- 
lations among the wavelet and scaling coefficients are small 
enough to be negligible. In fact, the magnitude of the cor- 
relations at different scales and times are dependent on the 
choice of wavelet basis and the spectral index 7 describing 



the power spectral density of the correlated component of the 
noise. We have chosen for our experiments the Daubechies 
4th-order wavelet basis which seems well-suited to the cases 
we considered. A perhaps more serious limitation is that the 
noise should be stationary. Real noise is often nonstationary. 
For example, photometric observations are noisier during pe- 
riods of poor weather, and even in good conditions there may 
be more noise at the beginning or end of the night when the 
target is observed through the largest airmass. It is possible 
that this limitation could be overcome with more elaborate 
noise models, or by analyzing the time series in separate seg- 
ments; future work on these topics may be warranted. 

Apart from the utility of the wavelet method, we draw the 
following conclusions based on the numerical experiments of 
§ 3. First, any analysis that ignores possible correlated errors 
(a "white" analysis in our terminology) is suspect, and any 2- 
3cr results from such an analy sis shoul d be rega rded as provi- 
sional at best. As shown in §§ !4.11l4~2l and !4.6l even data that 
appear "good" on visual inspection and that are dominated by 
uncorrelated noise may give parameter errors that are under- 
estimated by a factor of 2-3 in a whit e analysis. Second, using 
any of the methods described in 14.41 (the wavelet method, the 
time-averaging method, or the residual-permutation method) 
is preferable to ignoring correlated noise altogether. 

Throughout this work our main application has been es- 
timation of the parameters of a single time series or a few 
such time series, especially determining the midtransit times 
of transit light curves. One potentially important application 
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that we have not discussed is the detection of transits in a 
database of time-series photometry of many stars. Photomet- 
ric surveys such as the ground-based HAT (Bakos et al. 2007) 
and SuperWASP (Pollacco et al. 2006), and space-based mis- 
sions such as Corot (Baglin et al. 2003) and Kepler (Borucki 
et al. 2003) produce tens to hundreds of thousands of time se- 
ries, spanning much longer intervals than the transit durations. 
It seems likely that the parameters of a noise model could be 
very well constrained using these vast databases, and that the 
application of a wavelet-based whitening filter could facili- 
tate the detection of transits and the elimination of statistical 
false positives. Popular techniques for dealing with correlated 



noise in large photometric databases are those of Tamuz et 
al. (2005), Kovacs et al. (2005), and Pont et al. (2006). A 
priority for future work is to compare these methods with a 
wavelet-based method, by experimenting with realistic survey 
data. 
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